Question 2 (18 points): For a benchmark program executing in the Nindle e-reader, $20\%$ of the instructions are load/store, $50\%$ of the instructions are ALU operations and $30\%$ of the instructions are branches. On average, load/store instructions take 10 cycles to execute, ALU instructions execute in 1 cycle and branch instructions take 3 cycles to execute. The clock frequency for this processor is 4 GHz (1 GHz = $10^9$ Hz). This benchmark takes 20 seconds to execute.

a. (4 points) What is the average number of clock cycles per instruction (CPI) for this benchmark?

b. (5 points) How many instructions are executed by this benchmark?

c. (6 points) A revision of the architecture for the Nindle processor adds a new level to the memory hierarchy and thus reduces the average execution time of each load/store instruction to 5 cycles. In addition, an improvement to the compiler halves the number of load/store instructions required to execute this benchmark. How much time does it take to execute the same benchmark on this revised Nindle processor?

d. (3 points) How much faster is this benchmark in the improved Nindle (with both the revised architecture and the improved compiler) in comparison with the original Nindle?
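
One way to check the arithmetic for parts a through d is the short Python sketch below. It is not an official solution, and it rests on two assumptions that the question does not state explicitly: the clock frequency stays at 4 GHz in the revised processor, and the number of ALU and branch instructions is unchanged by the compiler improvement. The variable and function names are illustrative only.

```python
# Instruction mix and average cycles per class, taken from the question.
mix = {"load_store": 0.20, "alu": 0.50, "branch": 0.30}
cycles = {"load_store": 10, "alu": 1, "branch": 3}
clock_hz = 4e9        # 4 GHz
exec_time_s = 20.0    # measured run time on the original Nindle


def weighted_cycles(mix, cycles):
    """Average cycles per original instruction for a given mix."""
    return sum(mix[k] * cycles[k] for k in mix)


# Part a: CPI is the mix-weighted average of the per-class cycle counts.
cpi = weighted_cycles(mix, cycles)

# Part b: time = instructions * CPI / frequency, solved for instructions.
instruction_count = exec_time_s * clock_hz / cpi

# Part c: load/stores take 5 cycles and there are half as many of them.
# Assumption: ALU and branch counts and the clock frequency do not change.
# The new mix is expressed as fractions of the ORIGINAL instruction count,
# so it sums to 0.9 rather than 1.0.
new_mix = dict(mix, load_store=mix["load_store"] / 2)
new_cycles = dict(cycles, load_store=5)
new_time_s = instruction_count * weighted_cycles(new_mix, new_cycles) / clock_hz

# Part d: speedup is the ratio of old to new execution time.
speedup = exec_time_s / new_time_s

print(f"a. CPI                = {cpi:.2f}")
print(f"b. instruction count  = {instruction_count:.3e}")
print(f"c. revised time (s)   = {new_time_s:.2f}")
print(f"d. speedup            = {speedup:.2f}")
```

Expressing the revised mix relative to the original instruction count avoids having to renormalize the percentages after the load/store count drops.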